Analysis of data flows in complex enterprise IT environments

ABSTRACT

The present technology enables identification, visualization, and analysis of data flows via network components in complex enterprise IT environments including but not limited to servers, workstations, switches, routers, wireless access points, traffic shapers, firewalls, storage systems and SAN systems. A method and system provide accessing collected information, filtering network connections from the information, identifying data flows over the filtered network connections, mapping the data flows to network components to find paths, marking the data flows and the network components associated with the paths with attributes, and displaying a networked computer environment including the filtered network connections, and the marked data flows and the marked network components associated with the paths.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 14/217,244, filed on Mar. 17, 2014, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Technical Field

The present invention relates to a computerized system and method for identification, analysis, and visualization of data flows in complex enterprise IT environments.

2. Description of the Related Art

Enterprise Information Technology (IT) systems are complex. Various hardware and software IT components depend on each other in a variety of ways. Data flows from one computer system to another or to the same computer system via networking devices such as firewalls, routers, wireless access points, switches, storage devices, and appliances. Documentation and knowledge about such data flows and their paths is necessary for a variety of IT optimization, transformation, and audit projects. One such project is the protection and security of data environments, the definition of which, in turn, relates to the creation and definition of firewall rules and security environments (also sometimes simply called environments or affinity groups).

A number of security standards require proper documentation of data flows of various kinds, typically in the form of graphical diagrams. Traditionally, such diagrams are created manually using diagram drawing software. The information necessary to create data flow diagrams is provided by the owners of IT assets. Manually generated diagrams depict what IT asset owners believe they know about the data flows. Often, some information in the diagrams may be missing or incorrect. In addition to the errors due to information collection, extra errors may be introduced during the manual drawing of the diagrams even if special diagram drawing software is used. The number of network connections, data flows, and devices in real enterprise IT environments is typically very large and dynamic and, therefore, the probability of a human error is high.

Some software systems that visualize computer system dependencies depict network connections based on network connection monitoring or based on the analysis of software configurations. This method, while automated, does not result in data flow diagrams but rather in diagrams depicting network connections between computer systems.

There are some tools that were designed to identify data flows from specific data not easily available in real enterprise IT environments. Some of these systems require specific APIs or other intrusive instrumentation to be installed on network devices in order to be functional. For example, J. Hizver and T. Chieh, Tracking payment card data flow using virtual machine state introspection, ACSAC '11, wholly incorporated by reference as if fully set forth herein, requires hypervisor API usage, which is not applicable to physical servers and many virtual servers. Another difficulty with this solution is obtaining authorization to use it in real enterprise IT environments. Other tools attempt to infer transaction flow paths by analyzing the timing of network requests. This information is also not easily available in most real enterprise IT environments.

SUMMARY

The method described herein enables identification, visualization, and analysis of data flows via network components in complex enterprise IT environments, including but not limited to: servers, workstations, switches, routers, wireless access points, traffic shapers, firewalls, appliances, storage systems, and Storage Area Networking (SAN) systems. The technology relies on a combination of steps such as information collection, filtering, mapping, grouping, marking, report generation, and verification.

There is further presented a system for identification, visualization, and analysis of data flows, wherein the system includes a processor and a memory coupled to the processor. The memory stores a data flow identification, visualization, and analysis tool, which is executed by the processor.

In accordance with an embodiment of the present invention, there is disclosed a computer-implemented method for identifying, visualizing, and analyzing a networked computer environment, wherein the method includes: accessing collected information about the networked computer environment, including network topology of network components and network connections via a network topology graph, network component dependencies, configurations, and attributes, software components, software objects, and data objects; filtering one or more of the network connections from the information collected based on the software components, software objects, and data objects of certain types that are accessed via the network connections, thus resulting in filtered network connections; identifying data flows between the software components, software objects, and data objects of the certain types over the filtered network connections; mapping the data flows over the filtered network connections to the network components via the network topology graph in order to find paths between the software components, software objects, and data objects of the certain types, wherein each of the paths includes a set of the data flows mapped to a respective set of the network components; marking the set of the data flows included in each of the paths with one or more attributes of the software components, software objects, and data objects of the certain types associated with each of the paths, thus resulting in marked data flows associated with each of the paths; marking the respective set of the network components included in each of the paths with the one or more attributes of the marked data flows, thus resulting in marked network components associated with each of the paths; and displaying the networked computer environment including the filtered network connections, and the marked data flows and the marked network components associated with each of the paths.

In accordance with another embodiment of the present invention, there is disclosed a computer-implemented system to identify, visualize, and analyze a networked computer environment, wherein the system includes: a processing device; and a memory storing instructions that, when executed by the processing device, cause the processing device to perform operations including: accessing collected information about the networked computer environment, including network topology of network components and network connections via a network topology graph, network component dependencies, configurations, and attributes, software components, software objects, and data objects; filtering one or more of the network connections from the information collected based on the software components, software objects, and data objects of certain types that are accessed via the network connections, thus resulting in filtered network connections; identifying data flows between the software components, software objects, and data objects of the certain types over the filtered network connections; mapping the data flows over the filtered network connections to the network components via the network topology graph in order to find paths between the software components, software objects, and data objects of the certain types, wherein each of the paths includes a set of the data flows mapped to a respective set of the network components; marking the set of the data flows included in each of the paths with one or more attributes of the software components, software objects, and data objects of the certain types associated with each of the paths, thus resulting in marked data flows associated with each of the paths; marking the respective set of the network components included in each of the paths with the one or more attributes of the marked data flows, thus resulting in marked network components associated with each of the paths; and displaying the networked computer environment including the filtered network connections, and the marked data flows and the marked network components associated with each of the paths.

These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure will provide details in the following description of preferred embodiments with reference to the following figures. In the figures, corresponding or like numbers or characters indicate corresponding or like structures.

FIG. 1 is an example data flow diagram according to one embodiment of the present invention.

FIG. 2 is a system diagram for identifying, documenting, visualizing, and analyzing data flows according to one embodiment of the present invention.

FIG. 3 is a network topology (topology graph) diagram according to one embodiment of the present invention.

FIG. 4 is a diagram showing network connections and their attributes,according to one embodiment of the present invention.

FIG. 5 is a block diagram showing elements of a computer system suitable for implementing methods as described herein.

DETAILED DESCRIPTION

Data is stored, processed, and transferred between computer systems. Documentation of such data and information flows is necessary for many practical purposes, including data security, organization of security zones (security environments, environments, affinity groups), audits, and firewall rule optimization. Moreover, for practical security audit and security environment design purposes it is necessary to differentiate between the types of data flows. For example, data related to credit card information should be better protected than most other types of data. In order to properly secure all software and hardware systems that the data is flowing through, it is necessary to discover and document (e.g., typically in the form of diagrams) all such hardware and software systems (including the security environments).

FIG. 1 shows an example data flow diagram and relevant hardware and software systems. There are two data flows 110 and 111 from a group of user workstations 100 (e.g., workstations 100 are grouped into a group named “Usernet” shown in FIG. 1). Data flow 110 goes via wireless access point 101, firewall 102, and router 103 to server 104. Data flow 111 goes via router 103 to Web Server Profile 120 of server 104. From Web Server Application 123 of Web Server Profile 120, data flows further via data flow 112 to server 105. Another group of workstations 106 (e.g., workstations 106 are grouped into a group named “DMZ” shown in FIG. 1) has a data flow 113 to a database with Credit Cards data 121. Data flow 113 is depicted with dashed lines to differentiate it from data flows 110, 111, and 112, which do not carry credit card data. The information that data flow 113 is related to credit card details is a data flow attribute.

As shown in FIG. 2, the present technology relies on information collection 201, filtering 202, mapping 203, grouping 204, and marking 205 via a data flows analysis manager 200. If the resulting data flows information (e.g., diagrams and other reports) 210 does not pass manual or automated inspection (also known as verification) 211, the analysis may be repeated via the data flows analysis manager 200. Below we explain these possible steps in greater detail.

Information about the network connections and component dependencies in the enterprise IT environments as well as information about the network topology can be collected using a variety of tools and methods, and can be accessed for analysis by the data flows analysis manager 200. For example, modern switches support mechanisms to monitor and collect information about the network connections. Some tools collect information about the network connections and computer system component dependencies by analyzing software configurations or observing network connections on the computer systems. Computer system inventory discovery systems capture information about computer systems and their attributes and configurations. Nikolai Joukov, Birgit Pfitzmann, HariGovind V. Ramasamy, Murthy Devarakonda, “Application-Storage Discovery”, SYSTOR 2010, wholly incorporated by reference as if fully set forth herein, describes an example of the computer system inventory and dependency discovery system. Network topology discovery tools and methods typically rely on sending out probing requests and analyzing replies. Bruce Lowekamp, David R. O'Hallaron, and Thomas R. Gross, “Topology Discovery for Large Ethernet Networks”, SIGCOMM 2001, wholly incorporated by reference as if fully set forth herein, describes an example of the network topology discovery tool. Data collection tools or devices can be used with or without modifications and augmentations to collect more information for the purposes of data flows analysis. One example of the aforementioned augmentation is collection of network connection related information from configuration files of software installations on computer systems.

In general, information about the network topologies, network connections and network component dependencies, as well as inventory of computer systems, their software components, configurations and attributes, and classification and attributes of data objects and flows may be collected using tools or devices, manually, by interviewing personnel, from existing configuration management databases, or by any combination thereof. This step of collecting information is depicted as reference no. 201 in FIG. 2. Software installations, data objects, their configurations and attributes, subnetworks, security zones, and all types of groups of network components are included in the term “network components” herein. Files, directories, databases, tables, columns, queues, application modules, URLs, jobs, disks, and disk partitions are some examples of data objects.

It should be noted that some information may be inferred from other information during the data collection phase. For example, a network topology diagram may be extended with the network components that were not originally present but can be inferred from the information about network connections. Remote servers 316 and 317 shown in FIG. 3, for example, may not be present in the existing network diagrams but they may be added to the network topology diagram if the set of network connections includes connections to such servers.
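The inference described above can be sketched as follows. This is a minimal illustration in Python and not the claimed implementation; the function name extend_topology, the dictionary layout, and the sample addresses are assumptions introduced only for this example.

```python
def extend_topology(known_components, connections):
    """Return a copy of the component map extended with inferred endpoints.

    known_components: dict mapping IP address -> component name (assumed layout).
    connections: iterable of (source_ip, destination_ip) pairs.
    """
    extended = dict(known_components)
    for src_ip, dst_ip in connections:
        for ip in (src_ip, dst_ip):
            if ip not in extended:
                # Endpoint seen in a connection but absent from the diagram.
                extended[ip] = f"inferred-{ip}"
    return extended


# Hypothetical sample data: one known workstation, one unknown remote server.
known = {"10.1.2.11": "workstation 311"}
observed = [("10.1.2.11", "203.0.113.16")]
topology = extend_topology(known, observed)
```

The original map is left unchanged so that inferred components can be reviewed before they are merged into the documented topology.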

FIG. 3 shows one of the possible examples of network topology diagrams. User workstations 311 and 312 have a physical network link (e.g., with Ethernet cables) with switch 320 and a wireless link with access point 101. Remote servers 316 and 317 are connected to the networking environment depicted in the network topology diagram via Internet 330 and access point 101. Access point 101 is physically linked to firewall 102. Firewall 102 is physically linked to router 103. Router 103 is physically linked to switches 310 and 320. DMZ workstations 313, 314, and 315 and servers 104 and 105 are physically linked to switch 310. Server 105 and SAN Device 341 are physically linked to SAN switch 340.

FIG. 4 depicts an environment related to network connections. Network connection 401 is initiated from Web Application 123 on server 104 to database 122 on server 105. Network connection 401 is established from IP address 10.1.1.2 (referenced at no. 410), port 90000 (referenced at no. 414) to IP address 10.1.1.3 (referenced at no. 411), port 50000 (referenced at no. 415).

Not all collected information is relevant and necessary for the data flow analysis. For example, connections to port 53 of DNS servers and connections between server monitoring or management software may be filtered out from the collected data because they may not be necessary for the analysis of the data flows and they may not correspond to data flows. Similarly, some management devices or servers, or management software on the servers that is known not to be part of data flows, can be removed from the input information. The filtering process may result in a dramatic simplification of the information for analysis, sometimes reducing the number of connections by orders of magnitude. The filtering may be based on many types of rules including but not limited to filtering out specific types of software and network connections to that software, filtering out based on network device or computer system type, filtering out based on connection ports or accessed objects, or any combinations thereof. A typical example of connection filtering based on accessed objects is filtering of connections to shared folders: connections to the inter-process communication share named “IPC$” (e.g., referenced at no. 422 in FIG. 4) may be filtered out in most cases, while connections to shares with other names (e.g., network connection 402 to shared folder “Data$” referenced at no. 423 in FIG. 4) may be related to data flows and may be left in the information set for further analysis. Filtering step 202 is shown in FIG. 2.
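As a hedged illustration of rule-based filtering, the sketch below drops DNS connections and connections to the “IPC$” share while keeping connections to other shares. The Connection record, the FILTER_RULES list, and the sample addresses and ports are assumptions made for the example and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    src_ip: str
    dst_ip: str
    dst_port: int
    accessed_object: Optional[str] = None  # e.g. a shared folder name


# Hypothetical rule set: each rule returns True when a connection should be dropped.
FILTER_RULES = [
    lambda c: c.dst_port == 53,             # connections to DNS servers
    lambda c: c.accessed_object == "IPC$",  # inter-process communication share
]


def filter_connections(connections, rules=FILTER_RULES):
    """Keep only the connections that no rule filters out."""
    return [c for c in connections if not any(rule(c) for rule in rules)]


sample = [
    Connection("10.1.1.2", "10.1.1.9", 53),
    Connection("10.1.1.2", "10.1.1.3", 445, "IPC$"),
    Connection("10.1.1.2", "10.1.1.3", 445, "Data$"),
]
kept = filter_connections(sample)
```

Expressing each rule as a predicate keeps port-based, object-based, and software-based rules interchangeable, which matches the combinations of rule types described above.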

Network connections and dependencies are defined between servers, workstations, clusters of servers, other devices, software components, and software objects (data objects). For example, a simple connection may be defined between two IP addresses: source and destination. Network topology graphs, like the one shown in FIG. 3, contain information about immediate connections between computer systems. In order to generate diagrams similar to the one depicted in FIG. 1 it may be necessary to map network connections to the network topology nodes (such as servers, routers, workstations, storage devices, switches, firewalls, access points, and so forth) and network links (e.g., Ethernet cables) between them. This mapping is shown as step 203 in FIG. 2.

There are many ways to map network connections and other types of dependencies to network topology diagrams. For example, it is possible to use a standard depth-first graph search algorithm. In other words, for every network connection or dependency from network component A to network component B one may try to find all paths from A to B via the network topology graph. A more specific example: 1) for A and B, find corresponding network components in the network topology graph (by finding matching attributes such as IP addresses); 2) start from network component A on the network topology graph (assume that the current network component is A); 3) from the current network component, follow existing network links to reach adjacent network components; if an adjacent network component was already visited on the way from A, try to look for another way from A to B; if an adjacent network component was not visited, then repeat step 3 for the adjacent network component; and if the adjacent network component is B, then record the discovered path including the network components, links, and network connections. Look for paths from A to B until all possible paths are tried. It should be noted that several paths from A to B may be possible. For example, there are two paths for a network connection from workstation 311 to server 104 as shown in FIG. 3. Thus, the search for paths between workstation 311 and server 104 will find path1 via network components 311, 101, 102, 103, 310, 104 and path2 via network components 311, 320, 103, 310, 104. Path1 and path2 also correspond to data flows 110 and 111 respectively, as shown in FIG. 1.
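The depth-first search described above can be sketched as follows, using the links of FIG. 3 as they are described in the text. This is an illustrative sketch only: the names LINKS, FORWARDING, build_graph, and find_paths are hypothetical, and the restriction that only networking devices (and not workstations or servers) may appear in the middle of a path is an assumption added so that another workstation is not treated as a relay.

```python
# Physical and wireless links of FIG. 3, keyed by reference numeral.
LINKS = [
    ("311", "320"), ("311", "101"), ("312", "320"), ("312", "101"),
    ("316", "330"), ("317", "330"), ("330", "101"),
    ("101", "102"), ("102", "103"), ("103", "310"), ("103", "320"),
    ("313", "310"), ("314", "310"), ("315", "310"),
    ("104", "310"), ("105", "310"), ("105", "340"), ("341", "340"),
]

# Assumption: only these components forward traffic between other components.
FORWARDING = {"101", "102", "103", "310", "320", "330", "340"}


def build_graph(links):
    """Build an undirected adjacency map from a list of links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_paths(graph, src, dst):
    """Depth-first search for all loop-free paths from src to dst."""
    paths = []

    def visit(node, path):
        if node == dst:
            paths.append(path)  # record the discovered path
            return
        for nxt in sorted(graph[node]):
            if nxt in path:
                continue  # already visited on the way from src
            if nxt != dst and nxt not in FORWARDING:
                continue  # endpoints do not relay traffic (assumption)
            visit(nxt, path + [nxt])

    visit(src, [src])
    return paths


paths = find_paths(build_graph(LINKS), "311", "104")
# paths == [["311", "101", "102", "103", "310", "104"],
#           ["311", "320", "103", "310", "104"]]
```

Under these assumptions the search returns path1 and path2 for the connection from workstation 311 to server 104. An exhaustive path search can grow quickly on large topologies, so a practical tool would likely bound the path length or prune by configuration.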

The decision to follow a network component or a network link may be further enhanced by analyzing configurations of the network components. For example, if a firewall rule blocks network connections from A to B, there may be no reason to map the corresponding path from A to B via the firewall. However, even such paths may be useful for data flow analysis (e.g., to analyze how data would flow without a firewall or without a firewall rule). Similarly, it is possible to analyze routing configurations on the network components (any network component may have routing rules) and follow only the paths that comply with the routing rules. For example, there may be no reason to map path1 (e.g., data flow 110 shown in FIG. 1) if workstation 311 has a routing rule that directs all traffic to router 103 if the network link via switch 320 exists. In this case, the traffic will be routed via path2 (e.g., data flow 111 shown in FIG. 1). Firewall configuration and routing information can be collected in step 201 shown in FIG. 2 from the network systems, for example, by reading configuration files or issuing commands (such as “route”). Filtering out some data flow paths based on routing or firewall configurations is one of the forms of filtering (e.g., filtering step 202 shown in FIG. 2).
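A minimal sketch of configuration-based path pruning is shown below. The rule layout, a set of (firewall, source, destination) triples, and the function name prune_blocked_paths are assumptions for illustration; real firewall rules are usually expressed in terms of addresses, ports, and protocols.

```python
def prune_blocked_paths(paths, block_rules):
    """Drop paths that cross a firewall whose rule blocks the connection.

    block_rules: set of (firewall, source, destination) triples (assumed layout).
    """
    kept = []
    for path in paths:
        src, dst = path[0], path[-1]
        blocked = any((node, src, dst) in block_rules for node in path[1:-1])
        if not blocked:
            kept.append(path)
    return kept


candidate_paths = [
    ["311", "101", "102", "103", "310", "104"],  # path1, via firewall 102
    ["311", "320", "103", "310", "104"],         # path2, bypasses firewall 102
]
# Hypothetical rule: firewall 102 blocks connections from 311 to 104.
rules = {("102", "311", "104")}
allowed = prune_blocked_paths(candidate_paths, rules)
```

Passing an empty rule set returns every candidate path, which corresponds to the "what if the firewall rule did not exist" analysis mentioned above.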

There may be hundreds of thousands of workstations in a large company. There may be billions of network components on the Internet, with many of them communicating with the network environment being analyzed. In general, there may be a need to group related network components together in order to be able to efficiently analyze them and present results using visual diagrams. Step 204 shown in FIG. 2 depicts the grouping step.

One method to identify and group related network nodes together is based on matching rules. A matching rule may state that network components with a given name, IP address, or those that belong to a specific subnetwork should be grouped into a group. This formed group may have a name. A set of rules may be created in advance for common internet and cloud services with known IP address ranges. Such rules may be reused for different network environments and different companies. For example, servers 316 and 317 may have fixed IP addresses and belong to a known cloud service. In practical situations, it makes sense to group several such servers together into a group with an easy to recognize name. Other rules are built for a specific client environment. For example, workstations 311, 312, and 313 shown in FIG. 3 belong to a set of IP address ranges reserved for user workstations. Therefore, such workstation IP addresses would match a rule for user workstations based on IP addresses and may be grouped into a group named “Usernet” (e.g., workstations 100 grouped into the group Usernet as shown in FIG. 1). The rules may include a combination of other conditions and attributes. For example, a grouping rule may state that network components that belong to a given subnetwork and have a workstation (or non-server) type of operating system should be grouped into the workstations group. Another example of a network component-grouping rule is the rule to group wireless access points based on a set of attributes such as matching wireless network ID.
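A subnet-based matching rule of the kind described above could look like the following sketch, which uses Python's standard ipaddress module. The subnets, the group_for and group_components helpers, and the sample addresses are hypothetical; only the group names “Usernet” and “DMZ” come from FIG. 1.

```python
import ipaddress

# Hypothetical rules: (group name, list of subnets). The first match wins.
GROUP_RULES = [
    ("Usernet", ["10.1.2.0/24", "10.1.3.0/24"]),
    ("DMZ", ["10.1.4.0/24"]),
]


def group_for(ip, rules=GROUP_RULES):
    """Return the name of the first group whose subnets contain ip, else None."""
    address = ipaddress.ip_address(ip)
    for name, subnets in rules:
        if any(address in ipaddress.ip_network(subnet) for subnet in subnets):
            return name
    return None


def group_components(ips, rules=GROUP_RULES):
    """Map group name -> sorted list of member IP addresses."""
    groups = {}
    for ip in ips:
        groups.setdefault(group_for(ip, rules), []).append(ip)
    return {name: sorted(members) for name, members in groups.items()}


grouped = group_components(["10.1.2.11", "10.1.3.12", "10.1.4.13", "192.0.2.7"])
```

Addresses that match no rule are collected under the key None so that ungrouped components remain visible for review.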

One or multiple network connections may be grouped into groups of network connections. For example, connections from the same software component on one server to another software component on a server may be grouped together even if any other attributes or objects that belong to the aforementioned software components that are connected by the aforementioned network connections differ. Network connections and network component grouping rules themselves may have a variety of syntax forms, including XML and SQL syntax, or may be implemented as a code fragment as part of a computer program.
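Grouping rules implemented as a code fragment might resemble the sketch below, which groups connections by the software components at each end and ignores other attributes. The dictionary keys and the sample port numbers are assumptions chosen for the example.

```python
from collections import defaultdict


def group_connections(connections):
    """Group connections by the software components at each end.

    Each connection is a dict; the key names used here are assumptions.
    Differences in other attributes (such as ports) do not split a group.
    """
    groups = defaultdict(list)
    for conn in connections:
        key = (conn["src_server"], conn["src_software"],
               conn["dst_server"], conn["dst_software"])
        groups[key].append(conn)
    return dict(groups)


conns = [
    {"src_server": "104", "src_software": "Web Application 123",
     "dst_server": "105", "dst_software": "database 122", "src_port": 40001},
    {"src_server": "104", "src_software": "Web Application 123",
     "dst_server": "105", "dst_software": "database 122", "src_port": 40002},
]
connection_groups = group_connections(conns)
```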

Various network component connections and dependencies may correspond to data flows of different types. In addition, data flow direction may be the same as or different from the direction of a corresponding network connection or other dependency. A data flow may also be bidirectional, or its direction may be unspecified. Therefore, it is important to 1) map network connections or other dependencies to data flows, 2) assign data flow attributes such as data flow criticality and direction, and 3) map data flow attributes to network components. These operations are shown as marking 205 in FIG. 2.

There are a variety of ways to map network connections and other component dependencies to data flows. For example, one may assume that each network connection that is not filtered out corresponds to a data flow. Yet another way is to assume that connections correspond to data flows based on matching rules. A matching rule may be applied to various configuration elements and attributes. For example, specific types of software are known to establish connections that are data flows. In order to further differentiate connections that correspond to data flows or certain types of data flows, it is possible to use the rules that map connections to data flows based on configuration elements and attributes. (For example, software configuration files or other configuration elements may have information about the target server name or IP address and port number that correspond to data flows. It should be noted that such data flows may or may not correspond to network connections discovered during the information collection step.) In addition, mapping of connections to data flows may be performed by interviewing people to filter or augment data flows or deriving extra information from existing documentation. Identification of data flows is the mapping of network connections or dependencies or other kinds of collected information to data flows.
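One possible shape for such matching rules is sketched below. The rule table, the software type labels, and the direction labels are hypothetical; the default of treating an unmatched connection as a data flow with unspecified direction mirrors the first approach described above.

```python
# Hypothetical rules: software type -> (is a data flow, direction label).
FLOW_RULES = {
    "database": (True, "bidirectional"),
    "backup agent": (True, "same as connection"),
    "monitoring agent": (False, None),
}


def identify_data_flows(connections, rules=FLOW_RULES):
    """Map connections to data flows using software-type matching rules.

    Connections whose software type matches no rule are assumed to be
    data flows with an unspecified direction.
    """
    flows = []
    for conn in connections:
        is_flow, direction = rules.get(conn["dst_software_type"], (True, "unspecified"))
        if is_flow:
            flows.append({"connection": conn["id"], "direction": direction})
    return flows


flows = identify_data_flows([
    {"id": "401", "dst_software_type": "database"},
    {"id": "c2", "dst_software_type": "monitoring agent"},
    {"id": "402", "dst_software_type": "file share"},
])
```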

Data flows carry various types of data, and different types of data may be treated and analyzed differently. For example, data flows with credit card information are subject to audit and rigorous treatment based on specialized security standards. There are many ways to assign data flow attributes (e.g., type, direction, and criticality) to data flows. This assignment may be performed by interviewing or otherwise requesting input for data flows from people (typically information technology personnel). Attributes may be assigned based on existing documentation. Attributes may be assigned based on rules. For example, a rule may be used to detect that database 121 (named “Credit Cards”) contains credit card information either based on the database name or based on the names of the database columns or database data in its tables: if any data field matches a regular pattern (e.g., a credit card number pattern) the database is marked as a database with credit card data. Network connections and dependencies to such network components may be marked as credit card related. For example, let us assume that database 121 in FIG. 1 was detected to have credit card data based on a data type rule consisting of a credit card number regular pattern that matched against some database data. As a result, we can mark data flow 113 in FIG. 1 as a credit card related data flow because the data flow is connected to database 121, which is known to be related to credit card data.
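The rule-based marking described above can be sketched as follows. The regular expression is a deliberately simplified assumption (13 to 19 digits with optional separators) and would produce false positives in practice; the helper names has_card_data and mark_flows and the sample data are likewise hypothetical.

```python
import re

# Simplified pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
# This is an illustrative assumption; real detectors usually add checksum tests.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def has_card_data(name, column_names, sample_values):
    """Apply the name, column-name, and data-pattern rules to one database."""
    names = [name, *column_names]
    if any("credit card" in n.lower().replace("_", " ") for n in names):
        return True
    return any(CARD_PATTERN.search(str(v)) for v in sample_values)


def mark_flows(flows, card_databases):
    """Mark each flow whose source or destination is a card database."""
    return {
        flow_id: {"credit_card": src in card_databases or dst in card_databases}
        for flow_id, (src, dst) in flows.items()
    }


# Hypothetical sample: database 121 is detected from its data, not its name.
detected = has_card_data("db121", ["id", "pan"], ["4111 1111 1111 1111"])
card_dbs = {"121"} if detected else set()
marked = mark_flows({"113": ("106", "121"), "112": ("104", "105")}, card_dbs)
```

With this sample input, data flow 113 is marked as credit card related because it ends at database 121, while data flow 112 is not.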

Software tools or hardware appliances that inspect the data may be used to identify the type and other attributes of data flows. For example, information from Data Loss Prevention (DLP) systems often deployed on some network links may provide information about the type of some network connections and data flows such as data flows with credit card data.

A data flow may be directed via a path of network components of various kinds. These network components, if malfunctioning or compromised by malicious users, may disrupt a data flow or allow intruders to observe the data in the data flow. Therefore, such network components should be identified and treated appropriately. One of the ways to identify network components responsible for a data flow is to mark each network component on the data flow path as a component that carries the data flow. For example, data flow 110 shown in FIG. 1 depends on network components 100, 101, 102, 103, and 104. It should be noted that the granularity of this marking and assignment of data flow attributes to network components may vary. For example, the whole server 104 may be assigned to the data flow or only Web Server Profile 120 shown in FIG. 1.
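Marking the components on each path can be illustrated with the short sketch below. The data layout, the mark_components function, and the attribute labels are assumptions for this example; the paths follow data flows 110 and 113 of FIG. 1.

```python
def mark_components(flow_paths, flow_attributes):
    """Propagate each flow's attributes to every component on its path.

    flow_paths: flow id -> ordered list of components (assumed layout).
    flow_attributes: flow id -> set of attribute labels.
    """
    component_marks = {}
    for flow_id, path in flow_paths.items():
        for component in path:
            marks = component_marks.setdefault(
                component, {"flows": set(), "attributes": set()})
            marks["flows"].add(flow_id)
            marks["attributes"] |= flow_attributes.get(flow_id, set())
    return component_marks


# Data flows 110 and 113 of FIG. 1; the attribute labels are hypothetical.
flow_paths = {"110": ["100", "101", "102", "103", "104"], "113": ["106", "121"]}
flow_attributes = {"110": {"general"}, "113": {"credit_card"}}
marks = mark_components(flow_paths, flow_attributes)
```

A finer granularity could be obtained by listing a software component such as Web Server Profile 120 in the path instead of the whole server 104.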

As will be appreciated by a person skilled in the art, aspects of the present invention may be embodied as a method, system, or a computer program. Thus, aspects of the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware, as well as a computer program embodied in one or more computer readable medium(s). A computer readable medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system or device.

FIG. 5 depicts elements of a computer system suitable for implementing methods as described herein. The computer system includes processing unit 1 with one or more cores, memory, and may contain local storage in the form of a hard disk, flash disk, or other storage medium, or may have remotely accessible storage 3 or other components necessary to execute a computer program. It is noted that the computer system may contain more processing units in the form of servers or workstations 2 or other units capable of executing instructions such as printers, routers, switches, firewalls, storage controllers, special purpose networking equipment, and other units 4. The processing units may be interconnected via wired and/or wireless connections.

It should be understood that the terms “includes”, “include”, “including”, “comprises”, “comprise”, and “comprising” in this document specify the presence of the stated features, components, operations, and steps but do not preclude the presence of other features, components, operations, and steps. Articles “a”, “an”, and “the” are intended to include plural forms as well unless the context clearly states otherwise. The terminology used in this invention is for the purpose of describing a particular embodiment and is not intended to limit the invention.

As used herein, it is understood that the terms “program code” and “computer program code” are synonymous and mean any expression, in any language, code or notation, of a set of instructions intended to cause a computing device having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code or notation; and/or (b) reproduction in a different material form.

To this extent, program code can be embodied as one or more of: application/software program, component software/a library of functions, an operating system, a basic device system/driver for a particular computing and/or device, and the like.

A data processing system suitable for storing and/or executing program code can be provided hereunder and can include at least one processor communicatively coupled, directly or indirectly, to memory elements through a system bus. The memory elements can include, but are not limited to, local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output devices (including, but not limited to, keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening device controllers. It is inherent herein that the present invention is tied to at least one system (e.g., firewall 102), and/or transforms at least one article (e.g., firewall rules, etc.) and/or data representative of one article (e.g., data flow).

Network adapters also may be coupled to the system to enable the data processing system to become coupled to other data processing systems, remote printers, storage devices, and/or the like, through any combination of intervening private or public networks. Illustrative network adapters include, but are not limited to, modems, cable modems and Ethernet cards.

The foregoing description of various aspects of the invention has been presented for purposes of illustration and description. The description is not intended to be exhaustive or to limit the invention to the precise form disclosed, and obviously, many modifications and variations are possible. Such modifications and variations that may be apparent to a person skilled in the art are intended to be included within the scope of the invention as defined by the accompanying claims.

1. A computer-implemented method for identifying, visualizing, and analyzing a networked computer environment, the method comprising: accessing collected information about the networked computer environment, including network topology of network components and network connections via a network topology graph, network component dependencies, configurations, and attributes, software components, software objects, and data objects; filtering one or more of the network connections from the information collected based on the software components, software objects, and data objects of certain types that are accessed via the network connections, thus resulting in filtered network connections; identifying data flows between the software components, software objects, and data objects of the certain types over the filtered network connections; mapping the data flows over the filtered network connections to the network components via the network topology graph in order to find paths between the software components, software objects, and data objects of the certain types, wherein each of the paths includes a set of the data flows mapped to a respective set of the network components; marking the set of the data flows included in each of the paths with one or more attributes of the software components, software objects, and data objects of the certain types associated with each of the paths, thus resulting in marked data flows associated with each of the paths; marking the respective set of the network components included in each of the paths with the one or more attributes of the marked data flows, thus resulting in marked network components associated with each of the paths; and displaying the networked computer environment including the filtered network connections, and the marked data flows and the marked network components associated with each of the paths.
 2. The computer-implemented method according to claim 1, wherein the method further comprises collecting the information about the networked computer environment, including network topology of network components and network connections via the network topology graph, network component dependencies, configurations, and attributes, software components, software objects, and data objects.
 3. The computer-implemented method according to claim 1, wherein finding paths between the software components, software objects, and data objects via the network topology graph is performed using a depth-first graph search algorithm.
 4. The computer-implemented method according to claim 1, wherein finding paths between the software components, software objects, and data objects further comprises: analyzing routing and firewall rules of the network components as filtered; and excluding network connections as filtered from mapping of the data flows according to the routing and firewall rules.
 5. The computer-implemented method according to claim 1, wherein the method further comprises: defining and organizing data environments, security environments, and security zones to include the marked network components; defining firewalls for the data environments, security environments, and security zones; and displaying the networked computer environment including the data environments, security environments, and security zones, filtered network connections, and the marked data flows and the marked network components associated with each of the paths.
 6. A computer-implemented system to identify, visualize, and analyze a networked computer environment, the system comprising: a processing device; and a memory storing instructions that, when executed by the processing device, cause the processing device to perform operations comprising: accessing collected information about the networked computer environment, including network topology of network components and network connections via a network topology graph, network component dependencies, configurations, and attributes, software components, software objects, and data objects; filtering one or more of the network connections from the information collected based on the software components, software objects, and data objects of certain types that are accessed via the network connections, thus resulting in filtered network connections; identifying data flows between the software components, software objects, and data objects of the certain types over the filtered network connections; mapping the data flows over the filtered network connections to the network components via the network topology graph in order to find paths between the software components, software objects, and data objects of the certain types, wherein each of the paths includes a set of the data flows mapped to a respective set of the network components; marking the set of the data flows included in each of the paths with one or more attributes of the software components, software objects, and data objects of the certain types associated with each of the paths, thus resulting in marked data flows associated with each of the paths; marking the respective set of the network components included in each of the paths with the one or more attributes of the marked data flows, thus resulting in marked network components associated with each of the paths; and displaying the networked computer environment including the filtered network connections, and the marked data flows and the marked network components associated with each of the paths.
 7. The computer-implemented system according to claim 6, wherein the operations further comprise collecting the information about the networked computer environment, including network topology of network components and network connections via the network topology graph, network component dependencies, configurations, and attributes, software components, software objects, and data objects.
 8. The computer-implemented system according to claim 6, wherein finding paths between the software components, software objects, and data objects via the network topology graph is performed using a depth-first graph search algorithm.
 9. The computer-implemented system according to claim 6, wherein finding paths between the software components, software objects, and data objects further comprises: analyzing routing and firewall rules of the network as filtered; and excluding network connections as filtered from mapping of the data flows according to the routing and firewall rules.
 10. The computer-implemented system according to claim 6, wherein the operations further comprise: defining and organizing data environments, security environments, and security zones to include the marked network components; defining firewalls for the data environments, security environments, and security zones; and displaying the networked computer environment including the data environments, security environments, and security zones, filtered network connections, and the marked data flows and the marked network components associated with each of the paths.
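
By way of non-limiting illustration only, the filtering, path-finding, and marking operations recited above may be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the implementation of the invention: the dictionary-based record layout, the field names (`src`, `dst`, `object_type`, `attributes`), the component names, and the functions `filter_connections`, `find_path`, and `mark_paths` are all hypothetical. The topology graph is assumed to be an adjacency mapping, and connections excluded by routing or firewall rules are assumed to be supplied as a set of blocked links.

```python
from collections import defaultdict


def filter_connections(connections, wanted_types):
    """Keep only connections that access an object of a wanted type."""
    return [c for c in connections if c["object_type"] in wanted_types]


def find_path(topology, src, dst, blocked=frozenset()):
    """Iterative depth-first search over an adjacency mapping.

    Returns one path from src to dst as a list of components, or None.
    Links listed in `blocked` (as frozensets of two endpoints) are skipped,
    standing in for connections excluded by routing or firewall rules.
    """
    stack = [(src, [src])]
    visited = set()
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in topology.get(node, ()):
            if nxt not in visited and frozenset((node, nxt)) not in blocked:
                stack.append((nxt, path + [nxt]))
    return None


def mark_paths(connections, topology, wanted_types, blocked=frozenset()):
    """Map each filtered data flow to a path and mark flows and components."""
    marked_flows = []
    component_marks = defaultdict(set)
    for conn in filter_connections(connections, wanted_types):
        path = find_path(topology, conn["src"], conn["dst"], blocked)
        if path is None:
            continue  # no admissible path for this flow
        attrs = set(conn["attributes"])
        marked_flows.append({"src": conn["src"], "dst": conn["dst"],
                             "path": path, "attributes": attrs})
        for component in path:
            component_marks[component] |= attrs  # components inherit flow marks
    return marked_flows, dict(component_marks)


# Hypothetical environment: an application server reaches a database
# through a switch and a firewall; a workstation hangs off the same switch.
topology = {
    "app1": ["sw1"],
    "sw1": ["app1", "fw1", "ws1"],
    "fw1": ["sw1", "db1"],
    "db1": ["fw1"],
    "ws1": ["sw1"],
}
connections = [
    {"src": "app1", "dst": "db1", "object_type": "database", "attributes": ["PCI"]},
    {"src": "ws1", "dst": "app1", "object_type": "web", "attributes": ["public"]},
]

flows, marks = mark_paths(connections, topology, {"database"})
print(flows[0]["path"])
print(sorted(marks))
```

In this sketch only the database connection survives filtering, so the switch and firewall on its path inherit the flow's attribute while the workstation remains unmarked. The marked components could then be grouped into data environments or security zones for display; that step is outside the scope of the sketch.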